Abstract Competing risks model time to first event and type of first
event. An example from hospital epidemiology is the incidence of
hospital-acquired infection, which has to account for hospital discharge of
non-infected patients as a competing risk. An illness-death model would
allow further study of the hospital outcomes of infected patients. Such a
model typically relies on a Markov assumption. However, it is conceivable
that the future course of an infected patient depends not only on the time
since hospital admission and current infection status but also on the time
since infection. We demonstrate how a modified competing risks model can be
used for nonparametric estimation of transition probabilities when the
Markov assumption is violated.

Keywords: Left-truncation • Bivariate survival • Nosocomial Infection • 
Markov assumption • Multi-state model 



1 Introduction 

A competing risks model considers time to first event and type of first event. 
In real life, one competing event, say event 1, may be intermediate, and it 
could be of interest to investigate subsequent occurrence of event 2. This is 
feasible by extending the competing risks model to an illness-death model. 
The idea is that all individuals are initially subject to the original competing 
risks experiment. For those individuals who had a type 1 event as a first
event, a second experiment determines the waiting time between the type 1
event and the type 2 event. See Fine et al. (2001) for a related extension of
competing risks.

Both competing risks and illness-death models are, for instance, relevant
in hospital epidemiology (Beyersmann et al., 2011): Nosocomial, i.e.,
hospital-acquired, infections are a major healthcare concern, increasing
morbidity and mortality, and they are a problem from a health economics
perspective. Umscheid et al. (2011) considered preventable nosocomial
infections and argued that successful prevention could save up to 53,483
lives a year in the U.S., with up to $23.44 billion annual cost savings to
hospitals. Grambauer et al. (2010) recently demonstrated that estimating the
incidence of nosocomial infections must account for end of hospital stay
without prior infection as a competing risk, i.e., direct discharge of a
patient prevents in-hospital infection. Predicting length of hospital stay
for an infected patient or predicting the proportion of infected in-hospital
patients is relevant for the planning of hospital resources, but must account
for the time-dependency of the infection status as in an illness-death model
(Graves et al., 2011). In this model, all patients would share one initial
state. Infected patients move into the intermediate illness state at the
time of infection, and end of stay is modelled by transitions into the
absorbing state.

The canonical nonparametric estimator of the transition probabilities in
these models is the Aalen-Johansen estimator (Aalen and Johansen, 1978).
The estimator relies on a time-inhomogeneous Markov assumption, which is
trivially fulfilled for competing risks, but may be violated in an
illness-death model. In the context of nosocomial infections, the assumption
does not hold if the end-of-hospital-stay probability of an infected patient
depends on the time of infection.

Research for possibly non-Markov models has mostly focused on estimating
state occupation probabilities $P(X_t = j)$, where $X_t$ denotes the state
occupied at time $t$ and $j$ is a possible state of the model. Under a Markov
assumption and assuming one initial state occupied by all individuals at
time 0, say $P(X_0 = 0) = 1$, estimation may be based on the Aalen-Johansen
estimator of $P(X_t = j \mid X_0 = 0)$. In the absence of a common initial
state, the Aalen-Johansen estimator of $P(X_t = j \mid X_0 = \cdot)$ would
need to be multiplied by an estimator of the initial state distribution.

For complete data, Andersen et al. (1993) showed that this approach
equals the usual multinomial estimators which do not rely on a Markov
assumption. A major breakthrough for data subject to random right-censorship
was then obtained by Datta and Satten (2001) and Glidden (2002). Datta
and Satten showed that this Aalen-Johansen approach still consistently
estimates the state occupation probabilities in the absence of the Markov
property, and Glidden provided weak convergence results. Earlier work of
Pepe et al. (1991) had allowed for estimating the probability of an
intermediate condition in a non-Markov illness-death model. Interestingly,
Pepe et al. found their estimator to approximately equal the standard
Aalen-Johansen estimator, somewhat anticipating the subsequent more general
results of Datta and Satten.



Datta and Satten (2002) allowed for non-random censoring by directly
modelling the censoring hazard; see also related results by Datta et al.
(2000) for the illness-death model. Gunnes et al. (2007) discussed the
relative merits of the Aalen-Johansen and the Datta-Satten estimator in terms
of bias and mean squared error in the presence of dependent censoring. See
Datta and Ferguson (2012) for an overview.

A different line of research that could be applied to non-Markov
multistate models is time-multivariate survival analysis. Gill (1992)
mentions this possibility and gives an insightful discussion on why
nonparametric estimation of a multivariate survival function in the presence
of multivariate censoring is a difficult problem, where the usual counting
process approach breaks down. Lin and Ying (1993) noted that the
difficulties reduce and simpler estimation procedures are feasible if
censoring is univariate. This is the case in a multistate model. Tsai and
Crowley (1998) improved on the Lin-Ying estimator, and an overview was given
by Prentice et al. (2004).

The aim of the present paper is to use competing risks techniques for
nonparametric estimation of transition probabilities in a potentially
non-Markov illness-death model without recovery. This aim differs from
estimating state occupation probabilities $P(X_t = j)$ in that we do wish to
condition on the state occupied at time $s$, $s < t$. There is a connection
to time-multivariate survival analysis, because the first estimator that we
will derive is algebraically identical to an earlier proposal by
Meira-Machado et al. (2006). To the best of our knowledge, the work by
Meira-Machado et al. was the first paper which focused on using
time-multivariate techniques for estimation of transition probabilities in a
non-Markov illness-death model, employing the time-multivariate techniques
of Stute (1993).

We develop the Meira-Machado et al. estimator via a different route, 
which allows for a competing risks explanation on why their estimator works 
in a non-Markov model. We also give a new inverse probability of censor- 
ing weighted (IPCW) representation of the estimator. Using both the new 
IPCW representation and results of Tsai and Crowley (1998), we derive a
new, simpler and theoretically more efficient competing risks-type estimator. 
The new estimator gives direct access to competing risks methodology, which 
we demonstrate by also allowing for left-truncation. 

The paper is organized as follows: Section 2 introduces competing risks
and illness-death models as stochastic processes. The illness-death model
is also re-parametrized via a bivariate time vector and a further competing
risks model is derived, which will be crucial for the nonparametric
estimation procedures of Section 3. We report simulation results in
Section 4 and an analysis of real hospital infection data in Section 5. The
closing section offers a discussion, including an appraisal of the relative
merits of the
Meira-Machado et al. estimator and the new competing risks estimator. Our 
conclusion is that both estimators perform comparably, but that the new 
estimator may be preferred due to its computational simplicity. We also find 
that the Aalen-Johansen estimator may perform competitively even if the 
Markov assumption is violated. 

2 Competing risks and illness-death models 

Consider a stochastic process $(X_u)_{u \in [0,\infty)}$ with state space
$\{0,1,2\}$, right-continuous sample paths and initial state 0,
$P(X_0 = 0) = 1$. For a competing risks model with two competing risks, we
model $0 \to 1$ and $0 \to 2$ transitions, and states 1 and 2 are absorbing,
i.e., there are no transitions out of the absorbing states. In the context of
nosocomial infections, we will consider patients to enter state 0 on
admission to hospital. Occurrence of an infection is modelled by a $0 \to 1$
transition, end of hospital stay without prior infection is modelled by a
$0 \to 2$ transition.

[Figure 1; labels: "Event of type 1", "Event of type 2"]

Figure 1: Competing risks model and illness-death model without recovery.

We may extend this model to an illness-death model without recovery by
also allowing for $1 \to 2$ transitions. This is illustrated in Figure 1,
where the dashed arrow indicates that $1 \to 2$ transitions are only feasible
in the illness-death model. The addendum 'without recovery' means that
$1 \to 0$ transitions are not modelled. For nosocomial infections, this
entails that $X_u = 1$ is interpreted as 'in hospital at time $u$, infection
has occurred in $(0, u]$'. This interpretation is in line with the common
comparison of infected 'cases' and non-infected 'controls' in hospital
epidemiology. The interpretation of $X_u = 2$ is that hospital stay has ended
by time $u$.

Also note that the interpretation of states 1 and 2 differs between the 
models. For competing risks, the interpretation of state 1 is 'an infection 
has occurred', while the interpretation of state 2 is 'hospital stay has ended 
without prior infection'. 

Regardless of the model, we may define the time until first event,

$$T_0 = \inf\{u : X_u \neq 0\}. \tag{1}$$

The type of first event is

$$X_{T_0} \in \{1, 2\}, \tag{2}$$

the state entered by the process at time $T_0$.

For the illness-death model, we also define the time until absorption (end
of hospital stay),

$$T = \inf\{u : X_u = 2\}. \tag{3}$$

We have $T_0 = T$ if the process makes a direct $0 \to 2$ transition, and
$T_0 < T$ otherwise. We assume that the distribution of $T$ has mass on
$[0, \infty)$ only. That is, every individual reaches state 2 (spends a
finite time in hospital).

In the remainder of the paper, we will take $(X_u)_u$ to be an illness-death
model. The aim will be to provide for nonparametric estimation of the
transition probabilities

$$P_{lj}(s,t) = P(X_t = j \mid X_s = l), \tag{4}$$

where $(s,t)$, $s < t$, is a fixed, but arbitrary pair of times,
$l \in \{0,1\}$, $j \in \{1,2\}$. In (4), we do not assume that conditioning
on $X_s = l$ is tantamount to conditioning on the entire past of the process
up to time $s$. That is, we do not assume that $(X_u)_u$ is Markov.

More specifically and for ease of presentation, we will focus on
$P_{01}(s,t)$. In the data example, this is the probability that a patient is
infected and in hospital at time $t$, given no infection at time $s$. This
quantity can be used for the planning of hospital resources. Our ideas work
analogously for the other transition probabilities. We express $P_{01}(s,t)$
in terms of the bivariate time vector $(T_0, T)$,

$$P_{01}(s,t) = \frac{P(X_t = 1, X_s = 0)}{P(X_s = 0)} = \frac{P(s < T_0 \le t,\; t < T)}{P(T_0 > s)}. \tag{5}$$

The key to the nonparametric estimation procedures in Section 3 are
both (5) and the following competing risks process
$(\kappa_{u;s,t})_u = (\kappa_u)_u$, which is derived from the illness-death
process $(X_u)_u$,

$$\kappa_{u;s,t} = \kappa_u = \begin{cases}
0, & X_u \in \{0, 1\}, \\
1, & X_u = 2 \text{ and } \mathbf{1}(s < T_0 \le t,\; t < T) = 1, \\
2, & X_u = 2 \text{ and } \mathbf{1}(s < T_0 \le t,\; t < T) = 0,
\end{cases} \tag{6}$$

where $\mathbf{1}(\cdot)$ is the indicator function. The competing risks
process $\kappa$ stays in its initial state until time $T$. At time $T$, the
value of the competing risks mark $\mathbf{1}(s < T_0 \le t, t < T)$ is
known. We have that $P(\kappa_T = 1) = P(s < T_0 \le t, t < T)$. As a
consequence, the numerator of the right hand side of (5) is the limit of the
cumulative incidence function for event type 1 of $\kappa$,

$$P(s < T_0 \le t,\; t < T) = \lim_{u \to \infty} P(T \le u, \kappa_T = 1). \tag{7}$$

Note that the competing risks process $\kappa$ depends on the fixed, but
arbitrary pair of times $(s, t)$, $s < t$, but we are suppressing this in the
notation for ease of writing.
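
To make the construction in (6) concrete, the following minimal Python sketch evaluates the derived competing risks process for a single individual, given realised values of the time of first event and the absorption time. The function names `kappa_mark` and `kappa_state` and the example patient are illustrative assumptions, not notation from the paper.

```python
def kappa_mark(t0, t_abs, s, t):
    """Mark of the derived competing risks process, revealed at absorption.

    Returns 1 if s < T0 <= t and t < T (in state 0 at time s and in state 1
    at time t), and 2 otherwise.
    """
    return 1 if (s < t0 <= t) and (t < t_abs) else 2


def kappa_state(u, t0, t_abs, s, t):
    """State of kappa at time u: 0 until absorption time T, the mark afterwards."""
    return 0 if u < t_abs else kappa_mark(t0, t_abs, s, t)


# Hypothetical patient: infected on day 4, end of stay on day 9; s = 3, t = 5.
print(kappa_state(8, 4, 9, 3, 5))  # 0: absorption has not happened yet
print(kappa_state(9, 4, 9, 3, 5))  # 1: infected in (3, 5] and still in hospital after day 5
```

The mark is only revealed at the absorption time, which is why an individual censored in the intermediate state contributes no observed event of either type.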

3 Nonparametric estimation 

We assume that observation of the illness-death process $X$, or,
equivalently, of the random times $(T_0, T)$, is subject to random censorship
by $C$. We also assume that the support of the distribution of $T$ is
contained in the support of the distribution of $C$. This last assumption is
needed for estimation of the limit of the cumulative incidence function in
(7). It is justifiable for the nosocomial infection example, but may be
violated in other settings. In the discussion, we explain how this assumption
can be relaxed. We first revisit the estimator of Meira-Machado et al. (2006)
in Section 3.1, revealing that violations of the Markov assumption can be
handled via a competing risks approach and also giving a new IPCW
representation of the estimator. These two observations are taken further in
Section 3.2, leading to a simpler competing risks-type estimator, which in
turn also allows for left-truncated data as explained in Section 3.3.

3.1 The estimator of Meira-Machado et al. revisited 

For estimation of (5), we use the usual Kaplan-Meier estimator for
estimating the denominator $P(T_0 > s)$, based on the censored observations
of $T_0$. Because of (7), we use the right hand limit of the Aalen-Johansen
estimator of $P(T \le u, \kappa_T = 1)$ for estimation of the numerator. To
this end, and for the competing risks process $\kappa$, we write $N_1$ for the
counting process of observed events of type 1, $N$ for the counting process
of observed events (of any type), and $Y$ for the at-risk process. We also
write $N_0$ for the counting process of observed replicates of $T_0$ and
$Y_0$ for the at-risk process of the initial state of the illness-death model
$X$. Note that the processes $N_1$, $N$ and $Y$ depend on the fixed pair of
times $(s, t)$ through $\kappa$, but $N_0$ and $Y_0$ do not depend on
$(s,t)$. Then, these estimators are

$$\hat P(T_0 > s) = \prod_{v \in [0,s]} \left(1 - \frac{dN_0(v)}{Y_0(v)}\right) \tag{8}$$

and

$$\hat P(s < T_0 \le t,\; t < T) = \int_0^\infty \prod_{v \in [0,u)} \left(1 - \frac{dN(v)}{Y(v)}\right) \frac{dN_1(u)}{Y(u)}. \tag{9}$$

Recall that the right hand side of (9) depends on $(s,t)$ via $N_1$, $N$ and
$Y$. In the appendix, we show that the resulting estimator of $P_{01}(s,t)$,

$$\hat P_{01}(s,t) = \hat P(s < T_0 \le t,\; t < T) / \hat P(T_0 > s), \tag{10}$$

equals the estimator proposed by Meira-Machado et al. (2006), who derived
their estimator via a different route, using Kaplan-Meier integrals. Note that
the estimator (10) is, in general, different from the Aalen-Johansen
estimator. This is even true for the simple case of $s = 0$. Here, as a
function of $t$, the Aalen-Johansen estimator of $P_{01}(0,t)$ will change its
value whenever there is an observed $0 \to 1$ transition in the illness-death
model. In contrast, and assuming no ties, the non-Markov estimator will not
change its value (as a function of $t$) if the individual at hand is
subsequently censored in the intermediate state of the illness-death model.
This is so because $N_1$ is the counting process of observed events of type 1
of the competing risks process $\kappa$. The event times of $\kappa$ are the
waiting times until absorption of the illness-death process.

We now give a new IPCW representation of the estimator, which we will
subsequently use to modify and thereby simplify estimation of $P_{01}(s,t)$.
The idea is to express (9) in terms of a Kaplan-Meier estimator of the
censoring survival function and to then use an observation by Tsai and
Crowley (1998), who noted that there is more than one such estimator in
bivariate time.

We write $N^c$ for the counting process of censoring events, which have
been observed before absorption. We have that

$$\Delta N^c(u) + \Delta N(u) + Y(u+) = Y(u),$$

where $\Delta$ indicates the increment of the respective processes. As a
consequence,

$$\prod_{v \in [0,u)} \left(1 - \frac{dN^c(v)}{Y(v) - \Delta N(v)}\right) \prod_{v \in [0,u)} \left(1 - \frac{dN(v)}{Y(v)}\right) = \frac{Y(u)}{Y(0)}, \tag{11}$$

and the estimator in (9) equals

$$\frac{1}{Y(0)} \int_0^\infty \left[\prod_{v \in [0,u)} \left(1 - \frac{dN^c(v)}{Y(v) - \Delta N(v)}\right)\right]^{-1} dN_1(u). \tag{12}$$

Here, $\prod_{v \in [0,u)} \left(1 - \frac{dN^c(v)}{Y(v) - \Delta N(v)}\right)$
is the Kaplan-Meier estimator of $P(C > u)$, based on the censored
observations of $T$.
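
The product identity (11) is easy to check numerically. The sketch below is an illustration only: it assumes distinct observed times (no ties), and the function name, the exponential distributions and their parameters are hypothetical choices made for the check.

```python
import numpy as np


def product_identity(x, event, u):
    """Return (product of the two Kaplan-Meier factors on [0, u), Y(u) / Y(0)).

    x     : observed times min(T, C), assumed positive and distinct (no ties)
    event : True where absorption was observed, False where censored
    """
    x = np.asarray(x, dtype=float)
    event = np.asarray(event, dtype=bool)
    km_event, km_cens = 1.0, 1.0
    for v in np.sort(x[x < u]):
        at_risk = np.sum(x >= v)               # Y(v)
        d_event = np.sum(event & (x == v))     # increment of N at v
        d_cens = np.sum(~event & (x == v))     # increment of N^c at v
        km_event *= 1.0 - d_event / at_risk
        if d_cens > 0:
            km_cens *= 1.0 - d_cens / (at_risk - d_event)
    return km_event * km_cens, np.sum(x >= u) / x.size


rng = np.random.default_rng(0)
t_abs = rng.exponential(20.0, size=50)   # hypothetical absorption times
cens = rng.exponential(40.0, size=50)    # hypothetical censoring times
lhs, rhs = product_identity(np.minimum(t_abs, cens), t_abs <= cens, u=15.0)
print(lhs, rhs)  # the two numbers agree up to rounding error
```

Each observed time removes exactly one individual from the risk set, so every factor equals (Y(v) - 1) / Y(v) and the product telescopes to Y(u) / Y(0).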

3.2 A new competing risks-type estimator 



Tsai and Crowley (1998) observed that there is more than one
Kaplan-Meier-type estimator of $P(C > u)$, if a bivariate vector of event
times such as $(T_0, T)$ is subject to one censoring variable $C$. We
introduce some additional notation: We write $N_0^c$ for the counting process
of censoring events, which have been observed before leaving the initial
state of the illness-death model $X$. We also write ${}_sY$ for the at-risk
process of the competing risks model $\kappa$ in the data subset of
individuals who were still in the initial state of $X$ and under observation
at time $s$. We analogously define ${}_sN$, ${}_sN_1$ and ${}_sN^c$. Then Tsai
and Crowley suggested to use the following Kaplan-Meier-type estimator of
$P(C > u)$, specialized to our setting with $T_0 \le T$,

$$\prod_{v \in [0,s]} \left(1 - \frac{dN_0^c(v)}{Y_0(v) - \Delta N_0(v)}\right) \prod_{v \in (s,u)} \left(1 - \frac{d\,{}_sN^c(v)}{{}_sY(v) - \Delta\,{}_sN(v)}\right). \tag{13}$$

Replacing $\prod_{v \in [0,u)} \left(1 - \frac{dN^c(v)}{Y(v) - \Delta N(v)}\right)$
in (12) by (13) as an estimator of $P(C > u)$, we obtain a different estimator
of $P(s < T_0 \le t, t < T)$,

$$\tilde P(s < T_0 \le t,\; t < T) = \frac{1}{Y(0)} \int_s^\infty \left[\prod_{v \in [0,s]} \left(1 - \frac{dN_0^c(v)}{Y_0(v) - \Delta N_0(v)}\right)\right]^{-1} \left[\prod_{v \in (s,u)} \left(1 - \frac{d\,{}_sN^c(v)}{{}_sY(v) - \Delta\,{}_sN(v)}\right)\right]^{-1} dN_1(u).$$

Because $Y(0) = Y_0(0)$, $Y_0(s+) = {}_sY(s+)$ and (as a consequence of the
definition of $\kappa$) $N_1 = {}_sN_1$, this equals

$$\prod_{v \in [0,s]} \left(1 - \frac{dN_0(v)}{Y_0(v)}\right) \int_s^\infty \prod_{v \in (s,u)} \left(1 - \frac{d\,{}_sN(v)}{{}_sY(v)}\right) \frac{d\,{}_sN_1(u)}{{}_sY(u)},$$

where we have also used an analogous variant of (11) for
$\hat P(T_0 > s) = \prod_{v \in [0,s]} \left(1 - \frac{dN_0(v)}{Y_0(v)}\right)$.

The resulting estimator of $P_{01}(s,t)$ is

$$\tilde P_{01}(s,t) = \tilde P(s < T_0 \le t,\; t < T) / \hat P(T_0 > s) = \int_s^\infty \prod_{v \in (s,u)} \left(1 - \frac{d\,{}_sN(v)}{{}_sY(v)}\right) \frac{d\,{}_sN_1(u)}{{}_sY(u)}. \tag{14}$$

The estimator in (14) is simple: It is just an estimator of the limit of a
cumulative incidence function as in (9), but evaluated in the data subset
'still in the initial state of $X$ and under observation at time $s$'.
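
Because (14) is a cumulative incidence estimator computed in a data subset, it takes only a few lines to code. The following Python sketch is one possible implementation for right-censored data and not the authors' software. The function name and the input layout are assumptions made for illustration: `t0_obs` holds min(T0, C), `t_obs` holds min(T, C), and `delta` indicates whether absorption was observed.

```python
import numpy as np


def p01_new(t0_obs, t_obs, delta, s, t):
    """Competing risks-type estimate of P01(s, t) under right-censoring."""
    t0_obs = np.asarray(t0_obs, dtype=float)
    t_obs = np.asarray(t_obs, dtype=float)
    delta = np.asarray(delta, dtype=bool)

    # Data subset: still in the initial state and under observation at time s.
    subset = t0_obs > s
    x, d, x0 = t_obs[subset], delta[subset], t0_obs[subset]
    if x.size == 0:
        return float("nan")

    # Observed type 1 events of kappa: absorption observed, T0 <= t and t < T.
    # (If absorption is observed, T0 is observed as well, so x0 equals T0.)
    type1 = d & (x0 <= t) & (x > t)

    surv = 1.0       # Kaplan-Meier for absorption just before the current time
    estimate = 0.0
    for u in np.unique(x):
        at_u = x == u
        at_risk = np.sum(x >= u)
        estimate += surv * np.sum(type1 & at_u) / at_risk
        surv *= 1.0 - np.sum(d & at_u) / at_risk
    return estimate


# Hypothetical uncensored toy data: the estimate reduces to a simple proportion.
t0 = [1, 2, 3, 4, 5]
t_abs = [1, 6, 3, 9, 7]
print(p01_new(t0, t_abs, [1, 1, 1, 1, 1], s=1.5, t=5.5))  # 3 of 4 subjects, about 0.75
```

In the toy data, four subjects are still in the initial state at s = 1.5, and three of them are in the intermediate state at t = 5.5, so without censoring the estimate coincides with the empirical proportion.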

Standard competing risks arguments can be used to derive an estimator
of the variance of the new estimator (14) (Andersen et al., 1993, p. 299),

[Displayed formula for the variance estimator of (14): a sum of integrals
with respect to ${}_sN_1$ and ${}_sN_2$, built from products of
$(1 - d\,{}_sN(v)/{}_sY(v))$ and the at-risk process ${}_sY$; the display
did not survive extraction.]

where we have also used ${}_sN_2$ for the counting process of observed events
of type 2 of the competing risks model $\kappa$ in the data subset of
individuals who were still in the initial state of $X$ and under observation
at time $s$. This variance estimator is motivated by a corresponding
asymptotic expression (Andersen et al., 1993, p. 321).



Theoretically, the new estimator is more efficient than the one of
Meira-Machado et al. (2006, Theorem 2). The informal argument is that it uses
the full information from the subjects whose illness-death process was right
censored, whereas the Meira-Machado et al. estimator ignores the information
in which state the subjects were right censored. This can be seen by
comparing the weights used in the construction of the IPCW estimators (this
is where information from the censored subjects enters). The new estimator
uses the conditional weights given in (13). The first factor of (13)
estimates $P(C > s \mid T_0 > s)$ using all censored times that are less than
or equal to time $s$ and where the corresponding illness-death process is
censored in the initial state. The second factor estimates
$P(C > u \mid T_0 > s, C > s)$ using all the censoring times that are greater
than time $s$ and less than or equal to time $u$ for which the corresponding
illness-death process is in the initial state and under observation at time
$s$. The Meira-Machado et al. estimator uses IPCW weights derived from the
marginal Kaplan-Meier estimator of $P(C > u)$, which uses the censoring times
but not the state of the illness-death process at the individual censoring
time. There are similar results and a general theory for IPCW (van der Laan
and Robins, 2003) which could be used to show formally that the new estimator
(14) is asymptotically more efficient as compared to the estimator (10).
However, our simulation results and data example show comparable small
sample performances of both estimates (see Sections 4 and 5).

3.3 Left-truncated data 

So far, we have assumed that observation of the illness-death process is
subject to random censoring only. We now additionally allow for
left-truncation (delayed study entry), which can be handled by the new
estimator (14) because of general competing risks results (Andersen et al.,
1993). To be specific, assume that observation of the random times $(T_0, T)$
is subject to random left-truncation and right-censorship by $(L, C)$, i.e.,
we assume that the tuples $(T_0, T)$ and $(L, C)$ are independent.

We have to be precise about what delayed study entry means in this context,
because the new estimator (14) is an estimated cumulative incidence function,
estimated in the data subset 'in the initial state of $X$ and under
observation at time $s$'. This entails that only an individual whose
left-truncation time $L$ is less than its waiting time $T_0$ in the initial
state can enter the calculation. This is in contrast to standard
nonparametric estimation for a time-inhomogeneous Markov model, where an
individual may be in any non-absorbing state of the model at the time of
study entry.

We now write ${}_sY$ for the at-risk process of the competing risks model
$\kappa$ in the data subset of individuals whose left-truncation times were
less than $s$ and who were still in the initial state of $X$ and under
observation at time $s$. We analogously interpret ${}_sN$, ${}_sN_1$ and
${}_sN_2$. We can then profit from the general fact that counting processes
naturally account for left-truncation (Keiding, 1992) and estimate
$P_{01}(s,t)$ using

$$\int_s^\infty \prod_{v \in (s,u)} \left(1 - \frac{d\,{}_sN(v)}{{}_sY(v)}\right) \frac{d\,{}_sN_1(u)}{{}_sY(u)}.$$

At the beginning of the section, we had been forced to assume the support of
the distribution of $T$ to be contained in that of $C$, because integrals as
on the right hand side of the previous display are being evaluated up to
$\infty$. We now need to additionally account for the presence of
left-truncation. Essentially what we need to ensure is that the risk set
${}_sY$ is non-empty on $[s, \infty)$ with asymptotic probability larger than
zero. To be precise, we assume that for all $u < \inf\{v : P(T > v) = 0\}$
there exists a positive function $y$ on $[0,u]$, bounded away from zero, such
that

$$\sup_{v \in [s,u]} \left| {}_sY(v) / {}_sY(s+) - y(v) \right| \to 0$$

in probability as the 'sample size' ${}_sY(s+)$ goes to infinity (Andersen et
al., 1993, Condition (4.1.16)).



4 Simulation Study 

We now report results of a limited simulation study, where the aim is to
compare the finite sample performance of our new estimator from (14) with the
more complicated estimator from (10), which is algebraically equal to the
estimator of Meira-Machado et al. (2006). We also report results from using
the Aalen-Johansen estimator.

We simulated data from a scenario used by Meira-Machado et al., which
these authors found to be challenging both in terms of bias and variance. To
be specific, we generated replicates of $(T_0, X_{T_0})$ using an exponential
hazard of $0.039 + 0.026$ for simulating $T_0$ and deciding on $X_{T_0} = 1$
in a binomial experiment with probability $0.039/(0.039 + 0.026)$. If
$X_{T_0} = 1$, we set $T = 1.7 \cdot T_0$; as a consequence, the model is not
Markov. Random censoring was simulated from an exponential distribution with
parameters 0.013 or 0.035. In addition, we also investigated the new
estimator (14) when the data were subject to both left-truncation and
right-censoring. Left-truncation was simulated from a skew normal
distribution (Azzalini, 1985), with location equal to $-5$, scale equal to 10
and shape equal to 10. Right-censoring was exponentially distributed with
hazard 0.013.
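
The right-censored scenario described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' simulation code: the function name and output layout are hypothetical, the censoring "parameters" are read as hazard rates (in line with the table captions), and left-truncation is not included.

```python
import numpy as np


def simulate_study(n, censoring_hazard, rng):
    """Simulate one right-censored study from the non-Markov scenario."""
    h01, h02 = 0.039, 0.026
    t0 = rng.exponential(1.0 / (h01 + h02), size=n)    # time of first event
    infected = rng.random(n) < h01 / (h01 + h02)       # first event is of type 1
    t_abs = np.where(infected, 1.7 * t0, t0)           # T = 1.7 * T0 after a type 1 event
    cens = rng.exponential(1.0 / censoring_hazard, size=n)
    return {
        "t0_obs": np.minimum(t0, cens),    # min(T0, C)
        "t_obs": np.minimum(t_abs, cens),  # min(T, C)
        "delta0": t0 <= cens,              # first event observed
        "delta": t_abs <= cens,            # absorption observed
    }


study = simulate_study(100, 0.013, np.random.default_rng(2024))
print(study["delta"].mean())  # proportion of uncensored absorption times
```

Note that NumPy's `exponential` is parameterised by the scale, i.e. the reciprocal of the hazard, which is why the hazards are inverted.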

         P01(10,t)               P01(10,t)               Aalen-Johansen
  t      Bias       Variance     Bias       Variance     Bias       Variance
  30      1.92e-03  5.07e-03      1.91e-03  5.02e-03     -2.10e-02  3.92e-03
  40      4.69e-03  4.46e-03      4.74e-03  4.45e-03     -7.44e-03  3.57e-03
  50     -3.33e-03  4.44e-03     -3.21e-03  4.46e-03     -5.75e-03  3.62e-03
  60     -6.42e-03  3.86e-03     -6.35e-03  3.88e-03     -3.14e-03  3.08e-03
  70     -1.05e-02  3.05e-03     -1.05e-02  3.06e-03     -2.90e-03  2.54e-03
  80     -8.47e-03  2.39e-03     -8.49e-03  2.39e-03      1.26e-03  2.17e-03
  90     -9.61e-03  1.51e-03     -9.62e-03  1.51e-03      1.71e-03  1.60e-03
 100     -7.02e-03  1.11e-03     -7.03e-03  1.11e-03      5.05e-03  1.37e-03

Table 1: Simulation results for censoring hazard 0.013.

We simulated 1000 studies and report the bias (average of the 1000
estimates of $P_{01}(10,t)$ minus true quantity) and the empirical variance
of the estimates. In the presence of right-censoring only, the sample size in
each simulated study was 100. With additional left-truncation, the average
sample size was 85. The true value $P_{01}(10,t)$ was numerically
approximated based on 100 replications of uncensored samples of size 10000
using the usual binomial estimator within the data subset defined by 'in
state 0 at time 10', yielding

  t            30     40     50     60     70     80     90     100
  P01(10,t)    0.201  0.162  0.125  0.092  0.067  0.048  0.033  0.023
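
As a plausibility check on these simulated true values, the scenario also admits a closed form. Reading the design as exponential T0 with hazard 0.039 + 0.026, a type 1 first event with probability 0.039/(0.039 + 0.026), and T = 1.7 T0 after a type 1 event, the event {s < T0 <= t < T} requires max(s, t/1.7) < T0 <= t. The short Python sketch below evaluates the resulting expression. The derivation and the function name are an illustration added here under that reading of the design; they are not taken from the original study.

```python
import math


def true_p01(s, t, h01=0.039, h02=0.026, factor=1.7):
    """Closed-form P01(s, t) for the simulation scenario (illustrative derivation)."""
    rate = h01 + h02
    lower = max(s, t / factor)   # T0 must exceed both s and t / 1.7
    numerator = (h01 / rate) * (math.exp(-rate * lower) - math.exp(-rate * t))
    return numerator / math.exp(-rate * s)   # divide by P(T0 > s)


for t in (30, 40, 50, 60, 70, 80, 90, 100):
    print(t, round(true_p01(10, t), 3))
```

Under this reading, the closed-form values agree with the simulated ones to roughly two decimal places.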



Tables 1 and 2 give results for the right-censoring scenarios, Table 3
displays results for the scenario subject to both left-truncation and
right-censoring.

The tables indicate similar performance of both estimators (10) and (14)
in terms of bias and variance and in the presence of right-censoring only.
Similar results were found for a sample size of 200 (not shown).
Interestingly, Tables 1 and 2 find the Aalen-Johansen estimator to perform at
least competitively except for the early time point 30. This is somewhat in
contrast to
the results reported by Meira-Machado et al., who found the Aalen-Johansen 
estimator to be biased in the absence of the Markov property. The reason is 
that these authors considered the absolute bias integrated over time, which 
appears to be dominated by early time points. We find a similar picture 
when comparing the new estimator and the Aalen-Johansen in the presence 
of additional left-truncation. 
         P01(10,t)               P01(10,t)               Aalen-Johansen
  t      Bias       Variance     Bias       Variance     Bias       Variance
  30      3.31e-03  1.28e-02      2.92e-03  1.27e-02     -1.61e-02  7.27e-03
  40     -1.14e-02  1.54e-02     -1.16e-02  1.53e-02     -4.94e-03  9.52e-03
  50     -3.35e-02  1.29e-02     -3.36e-02  1.28e-02     -6.03e-03  9.48e-03
  60     -3.78e-02  8.93e-03     -3.80e-02  8.82e-03      3.41e-03  8.86e-03
  70     -4.14e-02  4.89e-03     -4.15e-02  4.87e-03      9.20e-03  8.55e-03
  80     -3.39e-02  2.78e-03     -3.39e-02  2.75e-03      2.04e-02  8.36e-03
  90     -2.75e-02  1.03e-03     -2.76e-02  1.01e-03      2.82e-02  7.74e-03
 100     -2.08e-02  3.94e-04     -2.08e-02  3.86e-04      3.56e-02  7.58e-03

Table 2: Simulation results for censoring hazard 0.035.





         Aalen-Johansen          P01(10,t)
  t      Bias       Variance     Bias       Variance
  30     -2.17e-02  4.00e-03      3.18e-04  5.41e-03
  40     -9.38e-03  4.03e-03      2.06e-03  5.24e-03
  50     -5.30e-03  3.55e-03     -1.33e-03  4.62e-03
  60     -1.38e-03  3.05e-03     -2.79e-03  4.02e-03
  70     -4.83e-04  2.42e-03     -6.90e-03  3.02e-03
  80      1.25e-03  2.02e-03     -8.43e-03  2.28e-03
  90      2.38e-03  1.69e-03     -9.27e-03  1.59e-03
 100      3.85e-03  1.38e-03     -9.10e-03  9.97e-04

Table 3: Simulation results for left truncated data and censoring hazard 0.013
5 Real data example

We use a random subsample of 1313 patients from the SIR3 (Spread of
nosocomial Infections and Resistant pathogens) study that has been made
publicly available as part of the R-package kmi (Beyersmann et al., 2012).
The present analyses may therefore be reproduced. SIR3 was a prospective
study to assess the occurrence and the impact of hospital-acquired
infections in intensive care. Details are reported elsewhere (Beyersmann et
al., 2006). Here, we focus on the occurrence of hospital-acquired pneumonia,
which is one of the most frequent and most severe nosocomial infections. In
an analysis of the full data set of 1876 patients, Allignol et al. (2011)
included time of pneumonia as a time-dependent covariate into Cox models for
the end-of-stay hazards (distinguishing between competing endpoints alive
discharge and hospital death). Because the hazard ratios were approximately
equal to one in this informal check of the Markov assumption, these authors
concluded that one may assume the data to follow a time-inhomogeneous
Markov model. However, because the confidence intervals were marginal, a
more robust estimation procedure as in the present paper may be desirable.

Tables 4, 5 and 6 report results on estimating $P_{01}(s,t)$ for $s = 3$,
$s = 5$ and $s = 7$, using both the new estimator (14) and the Meira-Machado
et al. estimator (10). These estimates are relevant for planning hospital
resources, estimating the probability of future infected intensive care
patients among the currently, i.e., at time $s$, uninfected.

The tables also report variance estimates and 95% confidence intervals
(CI) computed from 1000 bootstrap samples. We used the bootstrap in order to
have one common method for both estimators (10) and (14). Section 3 has shown
that estimating a cumulative incidence function is at the core of both
estimators, and recent research has investigated different proposals for
estimating the variance of an estimated cumulative incidence function (Braun
and Yuan, 2007; Allignol et al., 2010). Because of our representations (10)
and (14), the functional delta method justifies both use of the bootstrap and
of a normal limit. The tables report CIs both using the 2.5th and 97.5th
quantiles of the bootstrap estimates distribution and using a normal
approximation. Similar to the simulation study in Section 4, we find that
both estimators perform comparably.
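
The two kinds of interval reported in the tables can be sketched generically. The Python code below is an illustration, not the code used for the analysis: it resamples patients with replacement, takes the 95% percentile interval from the 2.5% and 97.5% bootstrap quantiles (an assumption about the intended quantiles), and builds the normal interval from the bootstrap variance. The function name and the toy statistic are hypothetical; in practice `statistic` would be an estimator of the transition probability applied to the resampled patient records.

```python
import numpy as np


def bootstrap_cis(data, statistic, n_boot=1000, seed=0):
    """Point estimate, bootstrap variance, percentile CI and normal CI (95%).

    data      : array whose first axis indexes patients
    statistic : function mapping such an array to a single number
    """
    rng = np.random.default_rng(seed)
    data = np.asarray(data)
    n = data.shape[0]
    # Resample patients with replacement and recompute the statistic.
    reps = np.array([statistic(data[rng.integers(0, n, size=n)])
                     for _ in range(n_boot)])
    point = statistic(data)
    variance = reps.var(ddof=1)
    percentile_ci = tuple(np.quantile(reps, [0.025, 0.975]))
    half_width = 1.96 * np.sqrt(variance)
    normal_ci = (point - half_width, point + half_width)
    return point, variance, percentile_ci, normal_ci


# Hypothetical usage with a simple statistic (the sample mean).
point, variance, pct_ci, norm_ci = bootstrap_cis(np.arange(100.0), np.mean)
print(point, pct_ci, norm_ci)
```

Using the same resamples for both estimators gives one common method of variance estimation, which is the motivation stated above for preferring the bootstrap here.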

Finally, Table 7 displays the point estimates of $P_{01}(s,t)$ together with
the corresponding Aalen-Johansen estimates. Both estimators yield similar
results.
      New estimator                                                   Meira-Machado estimator
  t   P01(s,t)  Variance  Bootstrap CI      Normal CI         P01(s,t)  Variance  Bootstrap CI      Normal CI
  5   0.0234    1.95e-05  [0.0152; 0.0323]  [0.0147; 0.032]   0.0255    1.95e-05  [0.0168; 0.0352]  [0.0162; 0.0347]
  6   0.0314    2.45e-05  [0.0219; 0.0413]  [0.0217; 0.0411]  0.0342    2.45e-05  [0.0244; 0.046]   [0.0236; 0.0448]
  7   0.0363    2.82e-05  [0.0258; 0.0469]  [0.0258; 0.0467]  0.0395    2.82e-05  [0.0286; 0.0517]  [0.0282; 0.0507]
  8   0.0396    3.17e-05  [0.0288; 0.051]   [0.0285; 0.0506]  0.0431    3.17e-05  [0.0315; 0.056]   [0.0313; 0.0549]
  9   0.0452    3.57e-05  [0.034; 0.0574]   [0.0335; 0.0569]  0.0492    3.57e-05  [0.0376; 0.0629]  [0.0366; 0.0618]
 10   0.0476    3.76e-05  [0.0361; 0.0596]  [0.0356; 0.0596]  0.0518    3.76e-05  [0.0392; 0.0655]  [0.0387; 0.0649]
 11   0.0502    4.04e-05  [0.0379; 0.0631]  [0.0377; 0.0627]  0.0547    4.04e-05  [0.0414; 0.0677]  [0.0414; 0.0679]
 12   0.0512    4.04e-05  [0.0388; 0.0637]  [0.0387; 0.0636]  0.0557    4.04e-05  [0.0432; 0.0695]  [0.0424; 0.0691]
 13   0.0520    4.25e-05  [0.0393; 0.0642]  [0.0392; 0.0648]  0.0566    4.25e-05  [0.0441; 0.0708]  [0.0432; 0.07]
 14   0.0552    4.42e-05  [0.0426; 0.068]   [0.0422; 0.0683]  0.0601    4.42e-05  [0.0471; 0.0747]  [0.0464; 0.0739]
 15   0.0545    4.31e-05  [0.0413; 0.0669]  [0.0416; 0.0673]  0.0593    4.31e-05  [0.0468; 0.0739]  [0.0456; 0.073]
 20   0.0452    3.68e-05  [0.0336; 0.0566]  [0.0333; 0.0571]  0.0492    3.68e-05  [0.037; 0.0632]   [0.0365; 0.062]
 30   0.0258    2.09e-05  [0.0174; 0.0346]  [0.0168; 0.0347]  0.0280    2.09e-05  [0.0191; 0.0391]  [0.018; 0.0381]
 40   0.0176    1.60e-05  [0.0101; 0.0256]  [0.0098; 0.0254]  0.0192    1.60e-05  [0.0115; 0.028]   [0.0108; 0.0275]
 50   0.0100    9.03e-06  [0.0045; 0.0163]  [0.0042; 0.0159]  0.0109    9.03e-06  [0.0055; 0.0179]  [0.0046; 0.0173]

Table 4: Estimate of $P_{01}(s,t)$, $s = 3$, using the new estimator and the
Meira-Machado estimator, along with bootstrap 95% CIs and CIs based on normal
approximation



| t | New estimator | Variance | Bootstrap CI | Normal CI | Meira-Machado estimator | Variance | Bootstrap CI | Normal CI |
|---|---|---|---|---|---|---|---|---|
| 7 | 0.0167 | 1.60e-05 | [0.0091; 0.0243] | [0.0089; 0.0246] | 0.0190 | 1.60e-05 | [0.0108; 0.0281] | [0.0101; 0.0278] |
| 8 | 0.0208 | 1.95e-05 | [0.0119; 0.0296] | [0.0121; 0.0294] | 0.0236 | 1.95e-05 | [0.0143; 0.0336] | [0.0137; 0.0334] |
| 9 | 0.0286 | 2.62e-05 | [0.0187; 0.0384] | [0.0186; 0.0386] | 0.0324 | 2.62e-05 | [0.0217; 0.0443] | [0.021; 0.0438] |
| 10 | 0.0325 | 2.99e-05 | [0.0213; 0.0435] | [0.0218; 0.0432] | 0.0369 | 2.99e-05 | [0.0255; 0.0498] | [0.0248; 0.049] |
| 11 | 0.0357 | 3.22e-05 | [0.0241; 0.0471] | [0.0246; 0.0468] | 0.0405 | 3.22e-05 | [0.0281; 0.0535] | [0.0279; 0.0531] |
| 12 | 0.0379 | 3.41e-05 | [0.0262; 0.0494] | [0.0264; 0.0493] | 0.0430 | 3.41e-05 | [0.0308; 0.0559] | [0.0301; 0.0558] |
| 13 | 0.0398 | 3.56e-05 | [0.0278; 0.0512] | [0.0281; 0.0515] | 0.0452 | 3.56e-05 | [0.0325; 0.0582] | [0.0321; 0.0583] |
| 14 | 0.0438 | 4.10e-05 | [0.0309; 0.056] | [0.0312; 0.0563] | 0.0497 | 4.10e-05 | [0.036; 0.0644] | [0.0361; 0.0633] |
| 15 | 0.0438 | 4.17e-05 | [0.0307; 0.0561] | [0.0311; 0.0565] | 0.0497 | 4.17e-05 | [0.0363; 0.0633] | [0.0359; 0.0635] |
| 20 | 0.0402 | 4.01e-05 | [0.0277; 0.0529] | [0.0278; 0.0526] | 0.0456 | 4.01e-05 | [0.0324; 0.0593] | [0.032; 0.0593] |
| 30 | 0.0233 | 2.43e-05 | [0.0139; 0.0336] | [0.0136; 0.0329] | 0.0264 | 2.43e-05 | [0.0157; 0.0374] | [0.0153; 0.0374] |
| 40 | 0.0174 | 1.88e-05 | [0.0096; 0.0264] | [0.0088; 0.0259] | 0.0196 | 1.88e-05 | [0.0109; 0.0304] | [0.0101; 0.0292] |
| 50 | 0.0102 | 1.16e-05 | [0.0042; 0.0175] | [0.0035; 0.0168] | 0.0115 | 1.16e-05 | [0.0049; 0.0195] | [0.0042; 0.0187] |

Table 5: Estimates of $P_{01}(s,t)$, s = 5, using the new estimator and the
Meira-Machado estimator, along with bootstrap 95% CIs and CIs based on a
normal approximation.



| t | New estimator | Variance | Bootstrap CI | Normal CI | Meira-Machado estimator | Variance | Bootstrap CI | Normal CI |
|---|---|---|---|---|---|---|---|---|
| 9 | 0.0165 | 2.02e-05 | [0.0087; 0.0266] | [0.0077; 0.0253] | 0.0192 | 2.02e-05 | [0.01; 0.0304] | [0.0087; 0.0297] |
| 10 | 0.0215 | 2.63e-05 | [0.0119; 0.0329] | [0.0115; 0.0316] | 0.0251 | 2.63e-05 | [0.0139; 0.0381] | [0.013; 0.0371] |
| 11 | 0.0269 | 3.33e-05 | [0.0167; 0.0398] | [0.0156; 0.0382] | 0.0313 | 3.33e-05 | [0.0186; 0.0459] | [0.0178; 0.0447] |
| 12 | 0.0297 | 3.61e-05 | [0.0195; 0.0438] | [0.0179; 0.0414] | 0.0345 | 3.61e-05 | [0.0218; 0.0494] | [0.0206; 0.0484] |
| 13 | 0.0334 | 4.16e-05 | [0.0218; 0.0478] | [0.0208; 0.0461] | 0.0389 | 4.16e-05 | [0.0248; 0.0546] | [0.024; 0.0538] |
| 14 | 0.0385 | 4.92e-05 | [0.0257; 0.0546] | [0.0248; 0.0523] | 0.0448 | 4.92e-05 | [0.0301; 0.0617] | [0.0293; 0.0604] |
| 15 | 0.0398 | 5.12e-05 | [0.0267; 0.0554] | [0.0258; 0.0538] | 0.0463 | 5.12e-05 | [0.0311; 0.0625] | [0.0309; 0.0617] |
| 20 | 0.0364 | 4.70e-05 | [0.0229; 0.0514] | [0.0229; 0.0498] | 0.0424 | 4.70e-05 | [0.0288; 0.058] | [0.0275; 0.0573] |
| 30 | 0.0245 | 3.28e-05 | [0.0139; 0.0375] | [0.0133; 0.0358] | 0.0287 | 3.28e-05 | [0.0166; 0.0424] | [0.0161; 0.0413] |
| 40 | 0.0209 | 2.72e-05 | [0.0111; 0.0321] | [0.0107; 0.0311] | 0.0244 | 2.72e-05 | [0.0135; 0.0383] | [0.0121; 0.0367] |
| 50 | 0.0130 | 1.77e-05 | [0.0057; 0.0222] | [0.0048; 0.0212] | 0.0152 | 1.77e-05 | [0.0061; 0.0259] | [0.0053; 0.025] |

Table 6: Estimates of $P_{01}(s,t)$, s = 7, using the new estimator and the
Meira-Machado estimator, along with bootstrap 95% CIs and CIs based on a
normal approximation.



| t | $\tilde{P}_{01}(3,t)$ | Aalen-Johansen | $\tilde{P}_{01}(5,t)$ | Aalen-Johansen | $\tilde{P}_{01}(7,t)$ | Aalen-Johansen |
|---|---|---|---|---|---|---|
| 5 | 0.0234 | 0.0266 | | | | |
| 6 | 0.0314 | 0.0359 | | | | |
| 7 | 0.0363 | 0.0411 | 0.0167 | 0.0200 | | |
| 8 | 0.0396 | 0.0446 | 0.0208 | 0.0250 | | |
| 9 | 0.0452 | 0.0515 | 0.0286 | 0.0343 | 0.0165 | 0.01987 |
| 10 | 0.0476 | 0.0533 | 0.0325 | 0.0376 | 0.0215 | 0.02498 |
| 11 | 0.0502 | 0.0559 | 0.0357 | 0.0419 | 0.0269 | 0.03141 |
| 12 | 0.0512 | 0.0569 | 0.0379 | 0.0440 | 0.0297 | 0.03481 |
| 13 | 0.0520 | 0.0578 | 0.0398 | 0.0460 | 0.0334 | 0.03813 |
| 14 | 0.0552 | 0.0612 | 0.0438 | 0.0503 | 0.0385 | 0.04389 |
| 15 | 0.0545 | 0.0605 | 0.0438 | 0.0505 | 0.0398 | 0.04503 |
| 20 | 0.0452 | 0.0509 | 0.0402 | 0.0445 | 0.0364 | 0.04218 |
| 30 | 0.0258 | 0.0292 | 0.0233 | 0.0270 | 0.0245 | 0.02726 |
| 40 | 0.0176 | 0.0204 | 0.0174 | 0.0196 | 0.0209 | 0.02061 |
| 50 | 0.0100 | 0.0115 | 0.0102 | 0.0111 | 0.0130 | 0.01165 |

Table 7: Point estimates $\tilde{P}_{01}(s,t)$ as in Tables 4-6 and
corresponding Aalen-Johansen estimates.
6 Discussion

We have demonstrated how to use competing risks techniques for estimating
transition probabilities in a non-Markov illness-death model without
recovery. For ease of presentation, we have focused on estimating
$P_{01}(s,t)$. Our first estimator, $\hat{P}_{01}(s,t)$ from (10), is
algebraically equal to the estimator of Meira-Machado et al. (2006), who
derived it using Kaplan-Meier integrals. We have also given a new IPCW
representation of the estimator, which we have then used to find a
computationally simpler estimator, $\tilde{P}_{01}(s,t)$ from (14).

To discuss the intrinsic properties of the proposed estimators, it is useful
to consider the special case where the process is fully observed for all
cases (uncensored data). Then, transition probabilities can be consistently
estimated by ratios of crude counts, even when the process is non-Markov. In
fact, for uncensored data both $\tilde{P}_{01}(s,t)$ and
$\hat{P}_{01}(s,t)$ reduce to

$$\frac{\frac{1}{n}\sum_{i=1}^{n} \mathbf{1}\{X_s^{(i)} = 0,\ X_t^{(i)} = 1\}}{\frac{1}{n}\sum_{i=1}^{n} \mathbf{1}\{X_s^{(i)} = 0\}} \tag{15}$$

where the superscript $(i)$ indicates the $i$th replicate of $n$ i.i.d.
copies of the multistate process. This is analogous to many estimators of
the state occupation probabilities, which reduce to the usual multinomial
estimators for complete data. In (15), each individual contributes with
equal weight $1/n$ to the sum in the numerator and in the denominator.
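For complete data, the ratio of crude counts can be written in a few lines. The following is a minimal sketch under assumptions made for this example only: each individual is encoded as a pair `(t0, t_abs)`, the time of leaving the initial state and the time of absorption, with `t0 == t_abs` standing for a direct move into the absorbing state. The function name `crude_p01` is hypothetical.

```python
def crude_p01(paths, s, t):
    """Ratio-of-counts estimate of P(X_t = 1 | X_s = 0) from complete data.

    Each path is a pair (t0, t_abs): the time the individual leaves state 0
    and the time of absorption; t0 == t_abs encodes a direct 0 -> 2 move.
    """
    def state(path, u):
        t0, t_abs = path
        if u < t0:
            return 0          # still in the initial state at time u
        return 1 if u < t_abs else 2

    in_0_at_s = [p for p in paths if state(p, s) == 0]
    if not in_0_at_s:
        raise ValueError("no individual is in state 0 at time s")
    in_1_at_t = sum(1 for p in in_0_at_s if state(p, t) == 1)
    # The common weight 1/n cancels between numerator and denominator.
    return in_1_at_t / len(in_0_at_s)


# Toy usage with six fully observed individuals.
toy_paths = [(2, 2), (4, 9), (6, 7), (5, 12), (1, 3), (8, 8)]
print(crude_p01(toy_paths, s=3, t=6))
```

No Markov assumption enters here: the estimate only counts who is in which state at the two time points.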

For right-censored data, the status of the process is unknown after the
individual end-of-study time. From an IPCW perspective, the idea underlying
$\hat{P}_{01}(s,t)$ is to restrict the summation in (15) to the individuals
not lost to follow-up before time $t$ and to re-weight their contributions
by the inverse of the probability of not being lost to follow-up. The
weights are based on a Kaplan-Meier estimate of the censoring distribution
using the censored observations of $T$, see (12).
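The re-weighting idea can be illustrated for the joint probability $P(s < T_0 \le t,\ t < T)$. The sketch below is not a transcription of (12): it assumes weights of the form $\delta_i / \hat{G}(\tilde{T}_i-)$, where $\hat{G}$ is the Kaplan-Meier estimate of the censoring survival function, and it assumes no ties among the observed times. All function and argument names are hypothetical.

```python
def censoring_survival_left(t_obs, events):
    """Kaplan-Meier estimate of P(C >= u), evaluated just before each t_obs[i].

    Censored observations (events[i] == 0) play the role of 'events' here.
    Assumes no ties among the observed times.
    """
    n = len(t_obs)
    order = sorted(range(n), key=lambda i: t_obs[i])
    g_left = [0.0] * n
    surv = 1.0
    for rank, i in enumerate(order):
        g_left[i] = surv              # value just before t_obs[i]
        at_risk = n - rank
        if events[i] == 0:            # a censoring is observed at t_obs[i]
            surv *= 1.0 - 1.0 / at_risk
    return g_left


def ipcw_joint_probability(t0_obs, t_obs, events, s, t):
    """IPCW estimate of P(s < T0 <= t, t < T) from right-censored data."""
    n = len(t_obs)
    g_left = censoring_survival_left(t_obs, events)
    total = 0.0
    for i in range(n):
        # Only individuals with an observed T contribute, re-weighted by
        # the inverse probability of being uncensored up to that time.
        if events[i] == 1 and s < t0_obs[i] <= t and t < t_obs[i]:
            total += 1.0 / g_left[i]
    return total / n


# Toy usage: the second individual is censored at time 2.
print(ipcw_joint_probability([1, 2, 2.5, 3.5], [1, 2, 3, 4], [1, 0, 1, 1], 2, 2.8))
```

With uncensored data every weight equals one, and the estimate is the plain proportion of individuals with $s < T_0 \le t$ and $t < T$.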

However, some individuals will be lost to follow-up in the initial state
and others in the disease state. This information is not used by
$\hat{P}_{01}(s,t)$, but $\tilde{P}_{01}(s,t)$ uses such information, see
(13). Theoretically, $\tilde{P}_{01}(s,t)$ is therefore more efficient, but
the simulation results and the practical data example found comparable
performance. The practical advantage of $\tilde{P}_{01}(s,t)$ is that it is
computationally simpler.

A further advantage of $\tilde{P}_{01}(s,t)$ is that, being an
Aalen-Johansen estimator of the limit of a certain cumulative incidence
function, it gives direct access to competing risks methodology, as we have
demonstrated by also allowing for left-truncated data. In the context of
hospital-acquired infections, such a delayed study entry may arise if
patients are not followed since admission but conditional on detection of
an infectious organism such as methicillin-resistant Staphylococcus aureus,
as in De Angelis et al. (2011).

So far, a drawback of the estimation procedures as outlined both in the
present paper and in Meira-Machado et al. (2006) is that we require the
support of the distribution of $T$ to be contained in the support of the
distribution of $C$ in order to be able to estimate the limit of a
cumulative incidence function, see (7). This is not a restriction for our
motivating data situation, but the assumption is often not fulfilled in
other medical applications. The problem can be circumvented by 'artificial
censoring' as, e.g., in Quale et al. (2006).



To be specific, consider the fixed, but arbitrary time pair $s < t$ and
assume that $s, t < \inf\{v : P(C > v) = 0\}$. Then there is a $\tau > t$
with $P(C > \tau) > 0$. The idea is to consider the modified random
variables $(\min(T_0, \tau), \min(T, \tau))$ instead of $(T_0, T)$. Their
distributions coincide on $[0,\tau) \times [0,\tau)$, which includes the
bivariate time point of interest $(s,t)$, and $\min(T, \tau)$ is less than
$\inf\{v : P(C > v) = 0\}$ by construction. We can then use the estimation
techniques as outlined earlier, but using the modified data. Note that the
data do change. E.g., if observation of $T$ is censored after the chosen
$\tau$, the modified variable $\min(T, \tau)$ has been observed.
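The change in the data under artificial censoring can be made concrete with a small sketch. It assumes the observed data are stored as three parallel lists (observed time of leaving the initial state, observed absorption time, event indicator for the absorption time); the function name `artificially_censor` and this layout are assumptions for illustration.

```python
def artificially_censor(t0_obs, t_obs, events, tau):
    """Replace (T0, T) by (min(T0, tau), min(T, tau)) in the observed data.

    Returns the modified observed times and the modified event indicator
    for the absorption time.
    """
    new_t0, new_t, new_events = [], [], []
    for x, y, d in zip(t0_obs, t_obs, events):
        new_t0.append(min(x, tau))
        if y >= tau:
            # Follow-up reached tau, so min(T, tau) = tau is observed,
            # whether or not T itself was censored later on.
            new_t.append(tau)
            new_events.append(1)
        else:
            # Nothing changes before tau: events stay events and
            # censorings stay censorings.
            new_t.append(y)
            new_events.append(d)
    return new_t0, new_t, new_events


# Toy usage: the second individual is censored at 15, i.e. after tau = 10.
print(artificially_censor([2, 5, 12, 3], [4, 15, 12, 6], [1, 0, 1, 0], tau=10))
```

The modified data will typically contain ties at $\tau$, so an estimator applied afterwards has to allow for ties.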

Finally, our limited simulation study indicated that the Aalen-Johansen
estimator may competitively estimate transition probabilities in small
samples even in the absence of the Markov property. This is not unlike the
findings of Gunnes et al. (2007) for estimating state occupation
probabilities.



Appendix 

The aim of the appendix is to show that our initial estimation procedure
based on the competing risks process k is algebraically identical with the
proposal of Meira-Machado et al. (2006). The idea of their estimator is to
consider $T_0$ as a covariate for the event time $T$ and to use Stute's
estimator for a Kaplan-Meier integral with a covariate (Stute, 1993).

For the purpose of comparison, note that the formulation of Meira-Machado
et al. is based on latent transition times between the states of the
illness-death model. These authors then consider censored variants of such
latent times, provided they are observable. Meira-Machado et al. then
arrive at censored variants of $(T_0, T)$, which will be our starting
point. Also note that because $T_0$ will be considered as a covariate for a
Kaplan-Meier integral with respect to $T$, we will only need an event
indicator for the latter. This will further simplify the notation. We will
also use that $T_0$ has been observed if $T$ has been observed, because
$T_0 \le T$.

Stute's method requires that the parameter of interest can be formulated
as an integral with respect to the joint distribution of $(T_0, T)$,

$$\int \varphi(x, y)\, P^{(T_0, T)}(dx, dy).$$

Again focussing on $P_{01}(s,t)$ for ease of presentation, the
Meira-Machado et al. estimator relies on estimating the above display for
$\varphi(x,y) = \mathbf{1}(s < x \le t,\ t < y)$.

Assume $n$ i.i.d. data $(\tilde{T}_{0i}, \tilde{T}_i, \delta_i)$,
$i = 1, \ldots, n$, where the tilde indicates a censored observation, e.g.,
$\tilde{T}_i = \min(T_i, C_i)$, $\delta_i$ is the event indicator
$\mathbf{1}(T_i \le C_i)$, and the index $i$ indicates the $i$th individual.
Stute's method (and the estimator of Meira-Machado et al.) is based on the
ordered data $\tilde{T}_{(1)} \le \cdots \le \tilde{T}_{(n)}$ with
$(\delta_{[i]}, \tilde{T}_{0[i]})$ attached to $\tilde{T}_{(i)}$. Again for
ease of presentation, we assume no ties in the data; Stute (1993) discusses
how to arbitrarily break ties if present. Note that our formulation of the
estimators does allow for ties.

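The Meira-Machado et al. estimator of $P(s < T_0 \le t,\ t < T)$ is

$$\sum_{i=1}^{n} \left[\prod_{j=1}^{i-1} \left(1 - \frac{\delta_{[j]}}{n-j+1}\right)\right] \frac{\delta_{[i]}}{n-i+1}\, \varphi\bigl(\tilde{T}_{0[i]}, \tilde{T}_{(i)}\bigr).$$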

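Using the counting process notation introduced earlier, the above display
equals

$$\sum_{i=1}^{n} \left[\prod_{j=1}^{i-1} \left(1 - \frac{\Delta N(\tilde{T}_{(j)})}{Y(\tilde{T}_{(j)})}\right)\right] \frac{\Delta N(\tilde{T}_{(i)})}{Y(\tilde{T}_{(i)})}\, \varphi\bigl(\tilde{T}_{0[i]}, \tilde{T}_{(i)}\bigr).$$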
We note two things about the last display: Firstly, because the sum runs
over all individuals and because addition and multiplication are each
commutative, ordering is not needed. Secondly, if
$\Delta N(\tilde{T}_i) = 1$, then $\tilde{T}_i = T_i$ and
$\tilde{T}_{0i} = T_{0i}$. Hence,
$\Delta N(\tilde{T}_i) \cdot \varphi(\tilde{T}_{0i}, \tilde{T}_i)$ equals
the increment at $\tilde{T}_i$ of the counting process of observed events
of the type of interest in our competing risks formulation. As a
consequence, the Meira-Machado et al. estimator of
$P(s < T_0 \le t,\ t < T)$ equals our competing risks-type estimator of this
probability, and hence our estimator (10) equals their estimator of
$P_{01}(s,t)$.
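The product-form weights in the Meira-Machado et al. estimator of $P(s < T_0 \le t,\ t < T)$ can be computed in one pass over the ordered data. The sketch below assumes no ties and stores the data as parallel lists; the function names `stute_weights` and `km_integral` are hypothetical and are not taken from any package.

```python
def stute_weights(t_obs, events):
    """Kaplan-Meier weights attached to each observation (no ties assumed).

    The weight of the observation with rank i is
    delta_[i] / (n - i + 1) * prod_{j < i} (1 - delta_[j] / (n - j + 1)).
    """
    n = len(t_obs)
    order = sorted(range(n), key=lambda i: t_obs[i])
    weights = [0.0] * n
    prod = 1.0
    for rank, i in enumerate(order, start=1):
        weights[i] = prod * events[i] / (n - rank + 1)
        prod *= 1.0 - events[i] / (n - rank + 1)
    return weights


def km_integral(t0_obs, t_obs, events, phi):
    """Weighted sum of phi(T0_i, T_i) over all individuals."""
    w = stute_weights(t_obs, events)
    return sum(w_i * phi(x, y) for w_i, x, y in zip(w, t0_obs, t_obs))


# Toy usage with phi(x, y) = 1(s < x <= t, t < y) for s = 2, t = 2.8.
s, t = 2, 2.8
phi = lambda x, y: 1.0 if (s < x <= t and t < y) else 0.0
print(km_integral([1, 2, 2.5, 3.5], [1, 2, 3, 4], [1, 0, 1, 1], phi))
```

Because the weights are stored per individual, the final sum does not depend on the order of summation. In the absence of ties, these weights coincide with $\delta_i / (n\,\hat{G}(\tilde{T}_i-))$, where $\hat{G}$ is the Kaplan-Meier estimate of the censoring survival function, and with uncensored data every weight equals $1/n$.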